Search results for "Computer Science - Learning"
showing 10 items of 18 documents
Optimal rates of convergence for persistence diagrams in Topological Data Analysis
2013
Computational topology has recently known an important development toward data analysis, giving birth to the field of topological data analysis. Topological persistence, or persistent homology, appears as a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces, with a statistical approach. We show that the use of persistent homology can be naturally considered in general statistical frameworks and persistence diagrams can be used as statistics with interesting convergence properties. Some numerical experiments are performed in various contexts to illustrate our results.
Remote Sensing Image Classification with Large Scale Gaussian Processes
2017
Current remote sensing image classification problems have to deal with an unprecedented amount of heterogeneous and complex data sources. Upcoming missions will soon provide large data streams that will make land cover/use classification difficult. Machine learning classifiers can help at this, and many methods are currently available. A popular kernel classifier is the Gaussian process classifier (GPC), since it approaches the classification problem with a solid probabilistic treatment, thus yielding confidence intervals for the predictions as well as very competitive results to state-of-the-art neural networks and support vector machines. However, its computational cost is prohibitive for…
A probabilistic estimation and prediction technique for dynamic continuous social science models: The evolution of the attitude of the Basque Country…
2015
In this paper, a computational technique to deal with uncertainty in dynamic continuous models in Social Sciences is presented.Considering data from surveys,the method consists of determining the probability distribution of the survey output and this allows to sample data and fit the model to the sampled data using a goodness-of-fit criterion based the χ2-test. Taking the fitted parameters that were not rejected by the χ2-test, substituting them into the model and computing their outputs, 95% confidence intervals in each time instant capturing the uncertainty of the survey data (probabilistic estimation) is built. Using the same set of obtained model parameters, a prediction over …
Optimized Kernel Entropy Components
2016
This work addresses two main issues of the standard Kernel Entropy Component Analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of by variance as in Kernel Principal Components Analysis. In this work, we propose an extension of the KECA method, named Optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular…
Simplifying Probabilistic Expressions in Causal Inference
2018
Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the …
Anomaly Detection Framework Using Rule Extraction for Efficient Intrusion Detection
2014
Huge datasets in cyber security, such as network traffic logs, can be analyzed using machine learning and data mining methods. However, the amount of collected data is increasing, which makes analysis more difficult. Many machine learning methods have not been designed for big datasets, and consequently are slow and difficult to understand. We address the issue of efficient network traffic classification by creating an intrusion detection framework that applies dimensionality reduction and conjunctive rule extraction. The system can perform unsupervised anomaly detection and use this information to create conjunctive rules that classify huge amounts of traffic in real time. We test the impl…
Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs
2017
Shapelets are discriminative time series subsequences that allow generation of interpretable classification models, which provide faster and generally better classification than the nearest neighbor approach. However, the shapelet discovery process requires the evaluation of all possible subsequences of all time series in the training set, making it extremely computation intensive. Consequently, shapelet discovery for large time series datasets quickly becomes intractable. A number of improvements have been proposed to reduce the training time. These techniques use approximation or discretization and often lead to reduced classification accuracy compared to the exact method. We are proposin…
Renewable Energy Prediction using Weather Forecasts for Optimal Scheduling in HPC Systems
2014
The objective of the GreenPAD project is to use green energy (wind, solar and biomass) for powering data-centers that are used to run HPC jobs. As a part of this it is important to predict the Renewable (Wind) energy for efficient scheduling (executing jobs that require higher energy when there is more green energy available and vice-versa). For predicting the wind energy we first analyze the historical data to find a statistical model that gives relation between wind energy and weather attributes. Then we use this model based on the weather forecast data to predict the green energy availability in the future. Using the green energy prediction obtained from the statistical model we are able…
Forecasting : theory and practice
2022
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a varie…
Probabilistic and team PFIN-type learning: General properties
2008
We consider the probability hierarchy for Popperian FINite learning and study the general properties of this hierarchy. We prove that the probability hierarchy is decidable, i.e. there exists an algorithm that receives p_1 and p_2 and answers whether PFIN-type learning with the probability of success p_1 is equivalent to PFIN-type learning with the probability of success p_2. To prove our result, we analyze the topological structure of the probability hierarchy. We prove that it is well-ordered in descending ordering and order-equivalent to ordinal epsilon_0. This shows that the structure of the hierarchy is very complicated. Using similar methods, we also prove that, for PFIN-type learning…